Attend Tuesday Zoom or watch video of Tuesday Zoom
Read book chapter
Optional review session on Zoom on Fridays
Complete all assignments by Monday at 11:59pm
Attendance
Synchronous students
MUST attend Tuesday Zoom sessions.
can also review the recordings
Asynchronous students
watch the Tuesday recordings
or attend some of the Tuesday Zoom sessions
or both
Failure to attend is a problem
Assignments
Due Mondays at 11:59pm
Failure to submit on time is a problem
This class is in transition
Taught for many years by Dr. Monica Gaddis
My second time teaching this class
Major changes
Software agnosticism
In-class switch from SPSS to R
Suman Sahil will co-teach and cover programming in R
Discussion boards ask for feedback
Proposed exam questions
Why R
Faculty consensus
Prepares you better for capstone/thesis work
More likely to be used in your future job
Integrates well with team-based tools
R is not as difficult as claimed
Work from existing templates
Python, SAS, Stata would have also been good choices
You are welcome to use these if you like
Student learning objectives, 1 of 3
SLO1
The graduate will be able to use statistics to analyze and interpret data. They will understand the fundamentals of the field in the context of recognizing the effective use of data or information for the specific discipline(s). They will select and apply appropriate statistical procedures to the information. They will be able to analyze and accurately interpret statistical results.
Student learning objectives, 2 of 3
SLO2
The graduate will be able to design a testable research question or hypothesis. They will have adequate background knowledge about biological, biomedical, or population health contexts and problems including common research problems in order to generate a research question or hypothesis. They will be able to relate problems within and across levels of areas of the spectrum to bridge disciplines.
Student learning objectives, 3 of 3
SLO5
The graduate will be able to communicate scientific outcomes. This includes the ability to convey scientific methods and statistical findings, effectively field questions in an oral presentation format as well as in the preparation of thesis or capstone manuscripts.
Break #1
What you have learned
About this class
What’s coming next
R and RStudio
R
RStudio
Other statistical software
JMP, Python, SAS, SPSS, Stata
Use only if you are confident in your abilities
Avoid Microsoft Excel
Break #2
What you have learned
R and RStudio
What’s coming next
Programming assignments
Test
Break #3
What you have learned
Programming assignments
What’s coming next
Scales of measurement
Scales of measurement
Dichotomy
Continuous
Categorical
Stevens scales of measurement (controversial!)
Nominal
Ordinal
Interval
Ratio
Addition/subtraction not allowed for ordinal data
Mean of ordinal data is meaningless
An example of ordinal data.
“Do you agree or disagree with the following statements”
“I believe that knowledge of Statistics is important for my job.”
1 = Strongly disagree,
2 = Disagree
3 = Neutral
4 = Agree
5 = Strongly agree
Another example of ordinal data, course grades
A = 4
B = 3
C = 2
D = 1
F = 0
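A minimal sketch of how ordinal responses like these can be summarized without taking a mean. The responses are hypothetical, invented only for illustration. Frequency counts and the (low) median use only the ordering of the categories, not the distance between them.

```python
from collections import Counter
from statistics import median_low

# Hypothetical responses to the survey item (assumption, not real data).
# Codes follow the scale above: 1 = Strongly disagree ... 5 = Strongly agree.
responses = [1, 2, 2, 3, 4, 4, 4, 5, 5, 5]

labels = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Frequency counts respect the ordering without assuming equal spacing.
counts = Counter(responses)
for code in sorted(labels):
    print(f"{labels[code]:<18} {counts.get(code, 0)}")

# median_low returns an observed category, so no addition or
# averaging of the codes is needed.
middle = median_low(responses)
print("Median response:", labels[middle])
```

The same approach works for the course grade example: report how many students earned each grade and the median grade.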
Break #4
What you have learned
Scales of measurement
What’s coming next
Tests of hypothesis
What is a population?
Population: a group that you wish to generalize your research results to. It is defined in terms of
Demography,
Geography,
Occupation,
Time,
Care requirements,
Diagnosis,
Or some combination of the above.
Example of a population
All infants born in the state of Missouri during the 1995 calendar year who have one or more visits to the Emergency room during their first year of life.
What is a sample?
Sample: subset of a population.
Random sample: every person has the same probability of being in the sample.
Biased sample: Some people have a decreased probability of being in the sample.
Always ask “who was left out?”
An example of a biased sample
A researcher wants to characterize illicit drug use in teenagers. She distributes a questionnaire to students attending a local public high school
(in the U.S. high school is grades 9-12, which is mostly students from ages 14 to 18.)
Explain how this sample is biased.
Who has a decreased or even zero probability of being selected?
Type your ideas in the chat box.
Fixing a biased sample
Redefine your population
Not all teenagers,
but those attending public high schools.
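A minimal sketch of the difference between a random sample and a biased sample. The population of 1,000 teenagers and the 800/200 split are made-up numbers, used only to show how some people can end up with zero probability of selection.

```python
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population (assumption): 1,000 teenagers,
# 800 of whom attend a public high school.
population = [{"id": i, "in_public_school": i < 800} for i in range(1000)]

# Random sample: every teenager has the same chance of selection.
random_sample = random.sample(population, k=100)

# Biased sample: only students at a public high school can be selected.
reachable = [p for p in population if p["in_public_school"]]
biased_sample = random.sample(reachable, k=100)

# "Who was left out?" These teenagers have zero probability of selection.
left_out = [p for p in population if not p["in_public_school"]]
print("Never eligible for the biased sample:", len(left_out))
```

If the population is redefined as teenagers attending public high schools, the list named reachable becomes the whole population and the second sample is no longer biased with respect to it.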
What is a parameter?
A parameter is a number computed from a population.
Examples
Average health care cost associated with the 29,637 children
Proportion of these 29,637 children who died in their first year of life.
Correlation between gestational age and number of ER visits of these 29,637 children.
Designated by Greek letters (\(\mu\), \(\pi\), \(\rho\))
What is a statistic?
A statistic is a number computed from a sample
Examples
Average health care cost associated with 100 children.
Proportion of these 100 children who died in their first year of life.
Correlation between gestational age and number of ER visits of these 100 children.
Designated by non-Greek letters (\(\bar{X}\), \(\hat{p}\), r).
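A minimal sketch contrasting a parameter with a statistic. The cost values are simulated, not real data; only the population size (29,637) and the sample size (100) come from the examples above.

```python
import random
from statistics import mean

random.seed(1995)  # fixed seed so the sketch is reproducible

# Simulated health care costs for 29,637 children (assumption: made-up
# values drawn from a normal distribution with mean 500 and sd 150).
population_costs = [random.gauss(500, 150) for _ in range(29_637)]

# Parameter: a number computed from the whole population.
mu = mean(population_costs)

# Statistic: a number computed from a random sample of 100 children.
sample_costs = random.sample(population_costs, k=100)
x_bar = mean(sample_costs)

print(f"Parameter (population mean): {mu:.1f}")
print(f"Statistic (sample mean):     {x_bar:.1f}")
```

In practice the population values are not available, so the parameter is unknown and the statistic is used to make inferences about it.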
What is Statistics?
Statistics
The use of information from a sample (a statistic) to make inferences about a population (a parameter)
Often a comparison of two populations
What is the null hypothesis?
The null hypothesis (\(H_0\)) is a statement about a parameter.
It implies no difference, no change, or no relationship.
Examples
\(H_0:\ \mu_1 - \mu_2 = 0\)
\(H_0:\ \pi_1 - \pi_2 = 0\)
\(H_0:\ \rho = 0\)
What is the alternative hypothesis?
The alternative hypothesis (\(H_1\) or \(H_a\)) implies a difference, change, or relationship.
Examples
\(H_1:\ \mu_1 - \mu_2 \ne 0\)
\(H_1:\ \pi_1 - \pi_2 \ne 0\)
\(H_1:\ \rho \ne 0\)
Hypothesis in English instead of Greek
Only statisticians like Greek letters
Translate to simple text
For two group comparisons
Safer, more effective
For regression models
Trend, association
Use PICO
P = patient population
I = intervention
C = control
O = outcome
Example of text hypotheses (1/2)
“… the objective of this 78-week randomised, placebo-controlled study was to determine whether treatment with nilvadipine sustained-release 8 mg, once a day, was effective and safe in slowing the rate of cognitive decline in patients with mild to moderate Alzheimer disease.”
Lawlor B, Segurado R, Kennelly S, et al. Nilvadipine in mild to moderate Alzheimer disease: A randomised controlled trial. PLoS Med. 2018; 15(9): e1002660. DOI: 10.1371/journal.pmed.1002660
PICO for this study
P = patients with mild to moderate Alzheimer disease
I = nilvadipine
C = placebo
O = cognitive function
Example of text hypotheses (2/2)
“… we investigated trends in BCC incidence over a span of 20 years and the associations between incident BCC and risk factors in a total population of 140,171 participants from 2 large US-based cohort studies: women in the Nurses’ Health Study (NHS; 1986–2006) and men in the Health Professionals’ Follow-up Study (HPFS; 1988–2006).”
Wu S, Han J, Li WQ, Li T, Qureshi AA. Basal-cell carcinoma incidence and associated risk factors in U.S. women and men. Am J Epidemiol. 2013; 178(6): 890–897. DOI: 10.1093/aje/kwt073
PICO for this study
P = female nurses/male health professionals
I = various risk factors
C = absence of various risk factors
O = presence/absence of BCC
One-sided alternatives
Examples
\(H_1:\ \mu_1 - \mu_2 \gt 0\)
\(H_1:\ \pi_1 - \pi_2 \gt 0\)
\(H_1:\ \rho \gt 0\)
Changes in only one direction expected
Changes in opposite direction uninteresting
Passive smoking controversy
EPA meta-analysis of passive smoking
Criticized for using a one-sided hypothesis
Samet JM, Burke TA. Turning science into junk: the tobacco industry and passive smoking. Am J Public Health. 2001;91(11):1742–1744.
What is a decision rule? (1/3)
Example
\(H_0:\ \mu_1 - \mu_2 = 0\)
\(H_1:\ \mu_1 - \mu_2 \ne 0\)
t = (\(\bar{X}_1-\bar{X}_2\)) / se
Accept \(H_0\) if t is close to zero.
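A minimal sketch of this decision rule for two means. The data are hypothetical. Two choices here are assumptions, not part of the slide: the standard error is the unpooled (Welch) form, and "close to zero" is taken to mean smaller in absolute value than the normal cutoff of about 1.96. With small samples, a cutoff from the t distribution would be somewhat larger.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_t(x, y):
    """Return (mean(x) - mean(y)) / se using an unpooled standard error."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical measurements in two groups (assumption, not real data).
group1 = [5.1, 4.8, 6.0, 5.5, 5.2, 4.9, 5.7, 5.3]
group2 = [4.9, 5.0, 5.8, 5.4, 5.1, 5.0, 5.6, 5.2]

t = two_sample_t(group1, group2)

# Two-sided cutoff at alpha = 0.05 from the standard normal distribution.
cutoff = NormalDist().inv_cdf(0.975)

decision = "accept H0" if abs(t) < cutoff else "reject H0"
print(f"t = {t:.2f}, cutoff = {cutoff:.2f}, decision: {decision}")
```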
What is a decision rule? (2/3)
Example
\(H_0:\ \pi_1 - \pi_2 = 0\)
\(H_1:\ \pi_1 - \pi_2 \ne 0\)
t = (\(\hat{p}_1-\hat{p}_2\)) / se
Accept \(H_0\) if t is close to zero.
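A minimal sketch of the same decision rule for two proportions. The counts are hypothetical. The pooled standard error and the normal cutoff of about 1.96 are assumptions chosen for illustration; other standard errors are possible.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_stat(events1, n1, events2, n2):
    """Return (p1_hat - p2_hat) / se with a standard error pooled under H0."""
    p1_hat = events1 / n1
    p2_hat = events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Hypothetical counts (assumption): 30 of 100 events versus 20 of 100.
t = two_proportion_stat(30, 100, 20, 100)

cutoff = NormalDist().inv_cdf(0.975)  # two-sided, alpha = 0.05
decision = "accept H0" if abs(t) < cutoff else "reject H0"
print(f"t = {t:.3f}, decision: {decision}")
```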
What is a decision rule? (3/3)
Example
\(H_0:\ \rho = 0\)
\(H_1:\ \rho \ne 0\)
t = r / se
Accept \(H_0\) if t is close to zero.
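A minimal sketch of the decision rule for a correlation. The data are hypothetical. The standard error formula, sqrt((1 - r^2) / (n - 2)), is the usual one for testing a correlation of zero, and the normal cutoff of about 1.96 is an approximation assumed here for simplicity.

```python
from math import sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """Return r / se, where se = sqrt((1 - r^2) / (n - 2))."""
    se = sqrt((1 - r ** 2) / (n - 2))
    return r / se


# Hypothetical data (assumption): gestational age in weeks and ER visits.
gestational_age = [32, 34, 36, 37, 38, 39, 40, 41]
er_visits = [4, 3, 3, 2, 2, 1, 1, 0]

r = pearson_r(gestational_age, er_visits)
t = correlation_t(r, len(gestational_age))

cutoff = NormalDist().inv_cdf(0.975)  # two-sided, alpha = 0.05
decision = "accept H0" if abs(t) < cutoff else "reject H0"
print(f"r = {r:.3f}, t = {t:.2f}, decision: {decision}")
```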
What is a Type I error?
A Type I error is rejecting the null hypothesis when the null hypothesis is true
False positive
Example involving drug approval: a Type I error is allowing an ineffective drug onto the market.
\(\alpha\) = P[Type I error]
What is a Type II error?
A Type II error is accepting the null hypothesis when the null hypothesis is false.
False negative result
Usually computed at MCD
An example involving drug approval: a Type II error is keeping an effective drug off of the market.
\(\beta\) = P[Type II error]
Power = \(1-\beta\)
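A minimal sketch linking alpha, beta, and power for a two-sided test. It assumes the test statistic is approximately normal, which is a simplification. The names delta (the true difference at which power is evaluated) and se (the standard error of the estimate) are hypothetical labels introduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(delta, se, alpha=0.05):
    """Approximate power of a two-sided test when the true difference is delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Probability that the test statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# When the null hypothesis is true (delta = 0), the rejection
# probability is alpha, the probability of a Type I error.
print(f"Rejection rate under the null: {approx_power(0.0, 1.0):.3f}")

# When the true difference is 2.8 standard errors, power is about 0.80,
# so beta, the probability of a Type II error, is about 0.20.
power = approx_power(2.8, 1.0)
beta = 1 - power
print(f"Power = {power:.2f}, beta = {beta:.2f}")
```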
What is a p-value?
Let t =
(\(\bar{X}_1-\bar{X}_2\)) / se, or
(\(\hat{p}_1-\hat{p}_2\)) / se, or
r / se
p-value = probability of the sample result, t, or a result more extreme,
assuming the null hypothesis is true
Small p-value, reject \(H_0\)
Large p-value, accept \(H_0\)
Alternate interpretations
Consistency between the data and the null
Small value, inconsistent
Large value, consistent
Evidence against the null
Small, lots of evidence against the null
Large, little evidence against the null
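A minimal sketch of the p-value calculation. It assumes the test statistic t is approximately standard normal when the null hypothesis is true. That is a large-sample simplification, and a t distribution would give slightly larger p-values for small samples.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(t):
    """P[result as extreme as t or more, in either direction | null is true]."""
    return 2 * (1 - STD_NORMAL.cdf(abs(t)))


def one_sided_p(t):
    """P[result as large as t or larger | null is true], for H1: parameter > 0."""
    return 1 - STD_NORMAL.cdf(t)


# A statistic close to zero is consistent with the null (large p-value).
print(f"t = 0.35: p = {two_sided_p(0.35):.3f}")

# A statistic far from zero is inconsistent with the null (small p-value).
print(f"t = 3.00: p = {two_sided_p(3.00):.4f}")
```

Note that both functions condition on the null hypothesis being true. Neither one returns the probability that the null hypothesis is true.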
What the p-value is not (1/2)
A p-value is NOT the probability that the null hypothesis is true.
P[t or more extreme | null] is different from
P[null | t or more extreme]
P[null] is nonsensical
\(\mu\), \(\pi\), or \(\rho\) are unknown constants (no sampling error)
What the p-value is not (2/2)
Not a measure of evidence FOR either hypothesis
Little evidence against the null \(\ne\) lots of evidence for the null
Not very informative if it is large
Need a power calculation, or
Narrow confidence interval
Not very helpful for huge data sets
A research paper computes a p-value of 0.45. How would you interpret this p-value?
Strong evidence for the null
Strong evidence for the alternative
Little or no evidence for the null
Little or no evidence for the alternative
More than one answer above is correct.
I do not know the answer.
Figure 1: xkcd cartoon about jelly beans and cancer
What is p-hacking?
Abuse of the hypothesis testing framework.
Run multiple tests on the same outcome
Test multiple outcome measures
Remove outliers and retest
Defenses against p-hacking
Bonferroni
Primary versus secondary
Published protocol
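A minimal sketch of why multiple testing is a problem and how the Bonferroni correction responds. The number of tests (20) and the p-values are hypothetical. The family-wise error formula assumes the tests are independent, which is a simplification.

```python
def familywise_error(m, alpha=0.05):
    """P[at least one false positive] across m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 only when a p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# With 20 tests at alpha = 0.05, a false positive somewhere is likely.
print(f"Chance of at least one false positive: {familywise_error(20):.2f}")

# Hypothetical p-values from three outcome measures (assumption).
p_values = [0.001, 0.02, 0.04]
print("Unadjusted:", [p < 0.05 for p in p_values])
print("Bonferroni:", bonferroni_reject(p_values))
```

All three hypothetical p-values fall below 0.05, but only the first survives the Bonferroni threshold of 0.05 / 3.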
Break #5
What you have learned
Tests of hypothesis
What’s coming next
Confidence intervals
What is a confidence interval?
Range of plausible values
Tries to quantify uncertainty associated with the sampling process.
Example of a confidence interval
Homeopathic treatment of swelling after oral surgery
95% CI: -5.5 to 7.5 mm
Lokken P, Straumsheim PA, Tveiten D, Skjelbred P, Borchgrevink CF. Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery. BMJ. 1995;310(6992):1439-1442.
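A minimal sketch of how a confidence interval of this kind can be built and read. The estimate and standard error passed to the function are hypothetical, and the normal-based formula (estimate plus or minus about 1.96 standard errors) is an assumption. The cited study may have used a different method. Only the interval from -5.5 to 7.5 mm comes from the slide.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Normal-based interval: estimate plus or minus z times the standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def contains(interval, value):
    """True when value lies inside the interval (endpoints included)."""
    lower, upper = interval
    return lower <= value <= upper


# Hypothetical estimate and standard error (assumption, not from the study).
lower, upper = confidence_interval(estimate=1.0, se=1.0)
print(f"95% CI: {lower:.2f} to {upper:.2f}")

# The interval reported on the slide, in mm.
reported = (-5.5, 7.5)
print("Reported interval contains zero:", contains(reported, 0.0))
```

Because the reported interval contains zero, a difference of zero in swelling is among the plausible values.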
Confidence interval interpretation (1 of 7)
Figure 2: Interval that contains the null value
Confidence interval interpretation (2 of 7)
Figure 3: Interval entirely above the null value
Confidence interval interpretation (3 of 7)
Figure 4: Interval entirely below the null value
Confidence interval interpretation (4 of 7)
Figure 5: Interval entirely inside the range of clinical indifference
Confidence interval interpretation (5 of 7)
Figure 6: Interval partly inside/outside range of clinical indifference
Quiz question, revisited
A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. This confidence interval tells you that the result is
statistically significant and clinically important.
not statistically significant, but is clinically important.
statistically significant, but not clinically important.
not statistically significant, and not clinically important.
The result is ambiguous.
I do not know the answer.
Confidence interval interpretation (6 of 7)
Figure 7: Confidence interval entirely inside the range of clinical indifference
Confidence interval interpretation (7 of 7)
Figure 8: Confidence interval entirely outside the range of clinical indifference
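A minimal sketch of the interpretation rules illustrated in the figures above. The function name, the wording of the labels, and the example intervals are hypothetical. The range of clinical indifference must come from subject-matter knowledge and is simply assumed here to run from -3 to 3.

```python
def interpret_interval(lower, upper, null_value, indiff_low, indiff_high):
    """Classify a confidence interval for statistical and clinical importance."""
    # Statistically significant when the null value is outside the interval.
    significant = not (lower <= null_value <= upper)

    if indiff_low <= lower and upper <= indiff_high:
        clinical = "not clinically important"  # entirely inside the range
    elif upper < indiff_low or lower > indiff_high:
        clinical = "clinically important"  # entirely outside the range
    else:
        clinical = "ambiguous"  # partly inside, partly outside
    return significant, clinical


# Hypothetical intervals, null value 0, range of clinical indifference -3 to 3.
examples = [(-1, 2), (4, 9), (1, 5), (-2, 6)]
for lower, upper in examples:
    significant, clinical = interpret_interval(lower, upper, 0, -3, 3)
    label = "significant" if significant else "not significant"
    print(f"({lower}, {upper}): {label}, {clinical}")
```

An interval that is not significant and lies entirely inside the range is a definitive negative result. An interval that is not significant but extends outside the range suggests that more research is needed.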
Why you might prefer a confidence interval
Provides the same information as a p-value,
Plus information about clinical importance
Distinguish between
definitive negative result, or
more research is needed
Break #6
What you have learned
Confidence intervals
What’s coming next
A simple R program
Grading rubric